packages = c('sf', 'tidyverse', 'spdep', 'tmap', 'funModeling', 'kableExtra')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}Take-Home Exercise 01
Water Points in Nigeria
The purpose of this study is to understand the spatial patterns of functional and non-functional water points in Nigeria.
1. Setting Up
Loading Packages
We will use the following packages:
sf: import geospatial datasetstidyverse: manipulate aspatial dataspdep: compute spatial weights and autocorrelationtmap: plot mapsfunModeling: quick exploratory data analysis
Loading and Preparing Waterpoint Dataset
The water point data is collected by the Water Point Data Exchange (WPdx) whose goal is to improve water access to rural communities by providing data to enable data-driven decision making. The dataset can be found here and the data dictionary here. The data is in csv format with latitude and longitude information.
wp <- read_csv("data/WPdx_plus_Nigeria.csv")Rows: 95008 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): #status_clean
dbl (2): #lat_deg, #lon_deg
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Due to the size of the dataset, it has already been pre-processed to keep only entries in Nigeria and some unused variables have been removed. The following code was used to pre-process the raw data file “WPdx_plus_full.csv” from the website but is not run on this page (raw data file is also not found on GitHub).
wp <- read_csv("data/WPdx_plus_full.csv") %>%
filter(`#clean_country_name`=="Nigeria") %>%
select(c(3:4, 22)) %>%
write_csv("data/WPdx_plus_Nigeria.csv")glimpse(wp)Rows: 95,008
Columns: 3
$ `#lat_deg` <dbl> 7.980000, 6.964532, 6.486940, 6.727570, 6.779900, 6.95…
$ `#lon_deg` <dbl> 5.120000, 3.597668, 7.929720, 7.648670, 7.664850, 7.77…
$ `#status_clean` <chr> NA, "Functional", NA, NA, NA, NA, NA, NA, NA, NA, NA, …
We can see that the variable names have some special characters which is not ideal. There are also a lot of NA values in the variable of interest (#status_clean). The following code chunk cleans the variable names and replaces the NA values with “Unknown”.
wp <- wp %>%
rename_with(~str_replace(.x, "#", "")) %>%
mutate(status_clean=replace_na(status_clean, "Unknown"))Let’s check the values of status_clean.
wp %>%
group_by(status_clean) %>%
summarise(n=n()) %>%
ungroup() %>%
kable() %>%
kable_styling()| status_clean | n |
|---|---|
| Abandoned | 175 |
| Abandoned/Decommissioned | 234 |
| Functional | 45883 |
| Functional but needs repair | 4579 |
| Functional but not in use | 1686 |
| Non-Functional | 29385 |
| Non-Functional due to dry season | 2403 |
| Non functional due to dry season | 7 |
| Unknown | 10656 |
The categories are more detailed than we need to study the proportion of functional and non-functional waterpoints. The following code chunk collapses the categories into 3 categories: “Functional”, “Nonfunctional” and “Unknown”.
wp <- wp %>%
mutate(status = case_when(
status_clean %in% c("Abandoned/Decommissioned",
"Abandoned",
"Non-Functional",
"Non functional due to dry season",
"Non-Functional due to dry season") ~ "Nonfunctional",
status_clean == "Unknown" ~ "Unknown",
status_clean %in% c("Functional",
"Functional but not in use",
"Functional but needs repair") ~ "Functional"
))Let’s visualise the proportions of functionality of the waterpoints. On the whole, only 55% of waterpoints are functional.
freq(wp, input="status")Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
of ggplot2 3.3.4.
ℹ The deprecated feature was likely used in the funModeling package.
Please report the issue at <https://github.com/pablo14/funModeling/issues>.

status frequency percentage cumulative_perc
1 Functional 52148 54.89 54.89
2 Nonfunctional 32204 33.90 88.79
3 Unknown 10656 11.22 100.00
Now, I convert the aspatial data into geospatial point data from the latitude and longitude variables using the st_as_sf() function. The GCS of the data is WGS1984 (EPSG:4326) as stated in the data dictionary.
wp_sf <- st_as_sf(wp,
coords = c("lon_deg", "lat_deg"),
crs=4326) Loading Administrative Boundary Data
I will also use the Nigeria Level-2 Administrative Boundary (also known as Local Government Area) polygon dataset from geoBoundaries.
adm_bound <- st_read(dsn="data",
layer="geoBoundaries-NGA-ADM2")Reading layer `geoBoundaries-NGA-ADM2' from data source
`D:\lins-92\ISSS624\Take-home_EX01\data' using driver `ESRI Shapefile'
Simple feature collection with 774 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2.668534 ymin: 4.273007 xmax: 14.67882 ymax: 13.89442
Geodetic CRS: WGS 84
glimpse(adm_bound)Rows: 774
Columns: 6
$ shapeName <chr> "Aba North", "Aba South", "Abadam", "Abaji", "Abak", "Abaka…
$ Level <chr> "ADM2", "ADM2", "ADM2", "ADM2", "ADM2", "ADM2", "ADM2", "AD…
$ shapeID <chr> "NGA-ADM2-72505758B79815894", "NGA-ADM2-72505758B67905963",…
$ shapeGroup <chr> "NGA", "NGA", "NGA", "NGA", "NGA", "NGA", "NGA", "NGA", "NG…
$ shapeType <chr> "ADM2", "ADM2", "ADM2", "ADM2", "ADM2", "ADM2", "ADM2", "AD…
$ geometry <MULTIPOLYGON [°]> MULTIPOLYGON (((7.401109 5...., MULTIPOLYGON (…
Let’s check what the Nigeria Level-2 Administrative Boundary and water points data looks like.
tmap_mode("plot")tmap mode set to plotting
tm_shape(adm_bound) +
tm_polygons() +
tm_text("shapeName", size=0.2) +
tm_shape(wp_sf) +
tm_symbols(size=0.1)
2. Status of Waterpoints
Now I have 2 geospatial datasets: a point dataset with waterpoint locations and a polygon data with administrative boundaries. I still need to count the number of points by status for each administrative area.
I will use the st_join() function to do a spatial join to relate the polygon IDs to each waterpoint by its location (note that the shapeName variable contains duplicates and should not be used for this step). The join=st_intersects() argument tells R the type of spatial join to use. Note that both datasets must have the same projection (WGS1984), which is why we have not transformed either dataset yet.
wp_named <- st_join(x = wp_sf,
y = adm_bound,
join = st_intersects,
left = TRUE)Next, we need check if there are any missing values.
sum(is.na(wp_named$shapeID))[1] 29
We can plot the points to visually check why these points are missing polygon IDs. Most likely it is because they fall outside any administrative area. If so, we can safely ignore these points.
Setting tmap_mode("view") creates an interactive plot so we can zoom in to check the points. In addition, tm_dots() is used instead of tm_shape() this time so that the size of each point scales dynamically when zooming in on the interactive map.
tmap_mode("view")tmap mode set to interactive viewing
tm_shape(adm_bound) +
tm_polygons() +
tm_shape(filter(wp_named, is.na(shapeID))) +
tm_dots(size=0.1,
col="red")We can see that these 29 points fall outside the boundary of Nigeria so we can exclude them.
Now let’s extract the number of waterpoints by status in each administrative boundary and join it to the administrative boundary polygon layer. First, I need to remove the geometry data using the st_drop_geometry() function to manipulate it like a regular dataframe using tidyr and dplyr functions.
The next step is to group by administrative area name and status to generate the count. Lastly, we pivot from long to wide format for joining with the administrative boundary dataset such. The values_fill=0 argument replaces any na values in the values_from variable with 0.
prop <- wp_named %>%
st_drop_geometry() %>%
group_by(shapeID, status) %>%
summarise(n=n()) %>%
ungroup() %>%
pivot_wider(id_cols=shapeID,
names_from=status,
values_from=n,
values_fill=0)`summarise()` has grouped output by 'shapeID'. You can override using the
`.groups` argument.
head(prop, n=5) %>%
kable() %>%
kable_styling()| shapeID | Functional | Nonfunctional | Unknown |
|---|---|---|---|
| NGA-ADM2-72505758B10049836 | 31 | 39 | 0 |
| NGA-ADM2-72505758B10063467 | 50 | 38 | 30 |
| NGA-ADM2-72505758B10065661 | 104 | 51 | 23 |
| NGA-ADM2-72505758B10302610 | 64 | 53 | 84 |
| NGA-ADM2-72505758B11317593 | 51 | 49 | 0 |
Now, we use left_join() to join the counts to the administrative boundary geospatial data. As this is the final dataset we will be working on, we can transform the projection to EPSG:26391. We also need to replace any na counts with 0. There are still na counts in this step because some polygons may not have any waterpoints and would not have been addressed in the previous step. Lastly, we add a new variable for total number of waterpoints.
adm_wp <- left_join(x=adm_bound,
y=prop,
by="shapeID") %>%
mutate(across(c(6:8), ~replace_na(.x, 0))) %>%
mutate(Total = Functional + Nonfunctional + Unknown) %>%
st_transform(crs = 26391)Absolute Number of Waterpoints
Finally, we can plot the number of waterpoints by status in each administrative area.
tmap_mode("plot")tmap mode set to plotting
total <- qtm(adm_wp, "Total")
func <- qtm(adm_wp, "Functional")
nonfunc <- qtm(adm_wp, "Nonfunctional")
unknown <- qtm(adm_wp, "Unknown")
tmap_arrange(total, func, nonfunc, unknown,
asp=1, ncol=2, nrow=2)
The distribution of waterpoints across Nigeria does not appear to be evenly distributed. There are some small administrative areas with many waterpoints in the central north. Based on the distribution of waterpoints of unknown status, we can infer that the north of Nigeria is likely more developed because there are fewer waterpoints of unknown status; likewise, central Nigeria may not be as developed because there is a higher number of waterpoints of unknown or non-functional status.
The next 3 code chunks plot waterpoints using quantile breaks. We can also add a histogram to view the distribution of total waterpoints.
tm_shape(adm_wp)+
tm_polygons("Total",
style="quantile",
palette="RdBu",
legend.hist=TRUE) +
tm_layout(main.title="Total Waterpoints in Nigeria",
main.title.size=1.1,
title.snap.to.legend=FALSE,
legend.outside=TRUE,
legend.hist.width = 1.1)
From this map, we can see that the north-east and south of Nigeria tend to have fewer waterpoints. More than 60% of administrative areas have less than 200 waterpoints. As we do not know the population or water demand of each administrative area, it is difficult to say which areas are water stressed or need additional water infrastructure.
tm_shape(adm_wp)+
tm_polygons("Functional",
style="quantile",
palette="RdBu",
legend.hist=TRUE) +
tm_layout(main.title="Functional Waterpoints in Nigeria",
main.title.size=1.1,
title.snap.to.legend=FALSE,
legend.outside=TRUE,
legend.hist.width = 1.1)
tm_shape(adm_wp)+
tm_polygons("Nonfunctional",
style="quantile",
palette="-RdBu",
legend.hist=TRUE) +
tm_layout(main.title="Non-Functional Waterpoints in Nigeria",
main.title.size=1.1,
title.snap.to.legend=FALSE,
legend.outside=TRUE,
legend.hist.width = 1.1)
From the maps above, it would seem that areas with many waterpoints tend to have many functional and non-functional waterpoints as well. As such, to assess the state of maintenance of waterpoints in each administrative area, it would be better to map the proportion of functional and non-functional waterpoints out of the total number of waterpoints.
Proportion of Waterpoints
First, we need to generate new variables for proportions. There are some administrative areas without any waterpoints which will result is na values for the proportions. We do not replace these na values with 0 because it will affect the subsequent analysis. We will need to be careful to exclude na values in the subsequent steps. We also should not remove these values because it will affect the neighbourhood structure when conducting spatial analysis.
There are 2 problems with replacing them with 0:
0 values represent a low proportion of functional/non-functional waterpoints. This is inaccurate since there were no waterpoints at all. Using 0 may skew the spatial distribution and clustering analysis.
These areas will appear to have low proportions for both functional and non-functional waterpoints. The relationship between should be negative.
adm_wp <-adm_wp %>%
mutate(pFunctional = Functional/Total,
pNonfunctional = Nonfunctional/Total,
pUnknown = Unknown/Total) The following plot shows the number of non-functional waterpoints out of total waterpoints by administrative area. It is sorted by descending order of proportion of non-functional waterpoints using the reorder() function.
We can see that there are some administrative areas on the left side of the plot with some few waterpoints and most of them are non-functional. Repairs should be focused on such areas with fewer waterpoints and high percentage that are non-functional.
ggplot(adm_wp) +
geom_bar(aes(x=reorder(shapeID, pNonfunctional, decreasing=TRUE),
y=Total,
fill="Total"),
stat="identity") +
geom_bar(aes(x=reorder(shapeID, pNonfunctional, decreasing=TRUE),
y=Nonfunctional,
fill="Non-functional"),
stat="identity",
alpha=0.8) +
scale_fill_manual(name="",
values=c("red", "gray30")) +
labs(title="Number of Water by Administrative Area",
subtitle="(sorted by proportion of non-functional)",
y="Number of waterpoints",
x="Administrative Areas")+
theme(axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
axis.line.y=element_line(colour="grey50"))
Now let’s plot the proportions spatially.
total <- tm_shape(adm_wp)+
tm_polygons("Total",
style="quantile",
palette="RdBu",
title="")+
tm_layout(main.title="Total waterpoints")
func <- tm_shape(adm_wp)+
tm_polygons("pFunctional",
style="quantile",
palette="RdBu",
title="")+
tm_layout(main.title="Proportion functional")
nonfunc <- tm_shape(adm_wp)+
tm_polygons("pNonfunctional",
style="quantile",
palette="-RdBu",
title="")+
tm_layout(main.title="Proportion non-functional")
unknown <- tm_shape(adm_wp)+
tm_polygons("pUnknown",
style="quantile",
palette="RdBu",
title="")+
tm_layout(main.title="Proportion unknown")
tmap_arrange(total, func, nonfunc, unknown,
asp=1, ncol=2, nrow=2)
We can see that many administrative areas in the north have more waterpoints and a higher proportion of functional waterpoints. Many states in the south part of Nigeria have few waterpoints a high proportion of non-functional waterpoints.
The spatial distribution of waterpoints could be influenced by climate and population distribution (see maps below). The south of Nigeria has a tropical climate (and thus higher rainfall) which could mean less reliance on man-made waterpoints and thus there are fewer waterpoints. On the other hand, many regions in the north of Nigeria have a high proportion of functional waterpoints, possibly because of the high reliance on them due to the arid climate.
Areas with higher population also tend to have more waterpoints.


3. How Are Waterpoints Distributed?
From plotting the total number of waterpoints and the proportion of functional waterpoints, we can visually see that waterpoints may not be evenly distributed across space in Nigeria. To confirm our intuition from visual inspection, we can test it statistically using global and local spatial autocorrelation statistics.
Defining the Neighbourhood
First, we must define the neighbourhood to be considered for each administrative area. There are a number of methods to do this (see In-Class Exercise 1). Contiguity matrices only consider polygons that are immediately adjacent while distance matrices use distance to determine the neighbour. The choice of weight matrix can affect the results of the analysis. For this exercise, I will try 2 methods: inverse-distance contiguity weight matrix and adaptive distance weight matrix.
Contiguity Weight Matrix
First, I create an nb object listing the neighbours of each administrative area. Queen method will be used to identify the adjacent neighbours. From the summary below, we can see that on average each administrative area is contiguous with about 6 other polygons. However, there is 1 administrative area which does not have any contiguous neighbours. It is likely an island. This means that a contiguity matrix is not suitable for this analysis.
wm_q <- poly2nb(adm_wp, queen=TRUE)
summary(wm_q)Neighbour list object:
Number of regions: 774
Number of nonzero links: 4440
Percentage nonzero weights: 0.7411414
Average number of links: 5.736434
1 region with no links:
86
Link number distribution:
0 1 2 3 4 5 6 7 8 9 10 11 12 14
1 2 14 57 125 182 140 122 72 41 12 4 1 1
2 least connected regions:
138 560 with 1 link
1 most connected region:
508 with 14 links
Adaptive Distance Weight Matrix
An adaptive distance weight matrix sets the fixed number of neighbours for each study area. It is usually used if there is large variation in polygon sizes but we need to set a consistent scale of analysis by considering the same number of neighbours for each area.
First, we need to find the centroids of each polygon. These will be used to determine the distances between polygons to set the neighbourhood.
longitude <- map_dbl(adm_wp$geometry, ~st_centroid(.x)[[1]])
latitude <- map_dbl(adm_wp$geometry, ~st_centroid(.x)[[2]])
coords <- cbind(longitude/1000, latitude/1000)In this exercise, I will fix the number of neighbours at 8 [1]. The knearneigh() function takes the coordinates and finds the 8 nearest neighbours of each polygon. We then convert it to an nb object using knn2nb() function. lastly, nb2listw() generates the spatial weight matrix from the nb object.
k8 <- knn2nb(knearneigh(coords, k=8))
k8Neighbour list object:
Number of regions: 774
Number of nonzero links: 6192
Percentage nonzero weights: 1.033592
Average number of links: 8
Non-symmetric neighbours list
k8_lw <- nb2listw(k8, style="B",
zero.policy=TRUE)Computing Global Moran’s I
The global Moran’s I test is intended to test if the independent variable (total waterpoints, proportion of functional and non-functional waterpoints) is evenly distributed, randomly distributed or clustered. Since we do not know the underlying distribution of waterpoints, we use Monte Carlo simulations (n=1000) to simulate randomly distribution of proportions spatially.
Note that that we must set na.action=na.exclude because there are na values in pFunctional and pNonfunctional. Because we omit these values, some areas may have less than 8 neighbours.
set.seed=123
moran.mc(adm_wp$Total,
listw=k8_lw,
nsim=999,
zero.policy = TRUE,
na.action=na.exclude)
Monte-Carlo simulation of Moran I
data: adm_wp$Total
weights: k8_lw
number of simulations + 1: 1000
statistic = 0.49306, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater
The above test shows that there is spatial clustering of waterpoints in Nigeria. The Moran’s I statistic of 0.49 is significant at the 5% significance level. We can reject the null hypothesis (that total waterpoints is randomly distributed spatially) and conclude that there isspatial clustering of waterpoints in Nigeria.
set.seed=123
moran.mc(adm_wp$pFunctional,
listw=k8_lw,
nsim=999,
zero.policy = TRUE,
na.action=na.exclude)
Monte-Carlo simulation of Moran I
data: adm_wp$pFunctional
weights: k8_lw
omitted: 3, 86, 241, 250, 252, 261, 400, 406, 447, 473, 492, 507, 526
number of simulations + 1: 1000
statistic = 0.52454, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater
set.seed=123
moran.mc(adm_wp$pNonfunctional,
listw=k8_lw,
nsim=999,
zero.policy = TRUE,
na.action=na.exclude)
Monte-Carlo simulation of Moran I
data: adm_wp$pNonfunctional
weights: k8_lw
omitted: 3, 86, 241, 250, 252, 261, 400, 406, 447, 473, 492, 507, 526
number of simulations + 1: 1000
statistic = 0.44023, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater
The computed Moran’s I of 0.52 (p-value=0.001) and 0.44 (p-value=0.001) for the proportions of functional and non-functional waterpoints respectively are significant at the 5% significance level. This indicates that there is some degree of clustering of proportions of functional and non-functional waterpoints.
4. Identifying Clusters and Outliers
Hotspots and coldspots can be detected using the local Moran’s I statistics. Unlike the global Moran’s I test, the local Moran’s I test calculates the test statistics for each observation. Each value measures the extent of significant spatial clustering of similar values around that observation.
The following code chunk conducts the local MI test and saves the result to a dataframe for both proportion of functional and non-functional waterpoints. The Moran’s I statistics and p-values are then joined to the polygon data to plot in a map.
localMI.func <- localmoran(adm_wp$pFunctional,
k8_lw,
na.action=na.exclude,
zero.policy=TRUE)
localMI.func <- data.frame(localMI.func)%>%
select(c(1,5)) %>%
rename(func.Ii = Ii,
func.Pr = Pr.z....E.Ii..)
localMI.nonfunc <- localmoran(adm_wp$pNonfunctional,
k8_lw,
na.action=na.exclude,
zero.policy=TRUE)
localMI.nonfunc <- data.frame(localMI.nonfunc)%>%
select(c(1,5)) %>%
rename(nonfunc.Ii = Ii,
nonfunc.Pr = Pr.z....E.Ii..)
adm_wp.localMI <- cbind(adm_wp, localMI.func, localMI.nonfunc) The local Moran’s score alone is not enough to show spatial clustering because it does not tell us whether the value of the variable being tested (proportion of functional/non-functional waterpoints) is high or low and whether the test result was significant. As such, we assign each observation to a quadrant depending on the value of the variable on the y-axis (centred around the mean) and Moran’s I on the x-axis. Quadrant 1 contains coldspots and quadrant contains hotspots. The following table explains the the quadrants:

The following code create 2 new variables each for proportion of functional and non-functional waterpoints. One for the centered proportion (around the mean) and the quadrant that the observation belong to. Note that we must include the na.rm=TRUE argument when computing mean because our data has na values. We also create a quadrant zero if the test statistic is not significant and the null hypothesis of random distribution cannot be rejected.
adm_wp.localMI <- adm_wp.localMI %>%
mutate(DV.func = pFunctional- mean(pFunctional, na.rm=TRUE)) %>%
mutate(func.quadrant = case_when(
func.Pr >0.05 ~0,
DV.func<0 & func.Ii>0 ~1,
DV.func<0 & func.Ii<0 ~2,
DV.func>0 & func.Ii<0 ~3,
DV.func>0 & func.Ii>0 ~4)) %>%
mutate(DV.nonfunc = pNonfunctional- mean(pNonfunctional, na.rm=TRUE)) %>%
mutate(nonfunc.quadrant = case_when(
nonfunc.Pr >0.05 ~0,
DV.nonfunc<0 & nonfunc.Ii>0 ~1,
DV.nonfunc<0 & nonfunc.Ii<0 ~2,
DV.nonfunc>0 & nonfunc.Ii<0 ~3,
DV.nonfunc>0 & nonfunc.Ii>0 ~4)) Plot LISA cluster map:
pfunc.map <- tm_shape(adm_wp.localMI) +
tm_fill(col = "pFunctional",
style="quantile",
title="Proportion") +
tm_borders(alpha=0.5) +
tm_layout(main.title="Proportion Functional")
pnonfunc.map <- tm_shape(adm_wp.localMI) +
tm_fill(col = "pNonfunctional",
style="quantile",
title="Proportion") +
tm_borders(alpha=0.5) +
tm_layout(main.title="Proportion Non-functional")
colors <- c("#ffffff", "#2c7bb6", "#abd9e9", "#fdae61", "#d7191c")
clusters <- c("insignificant", "low-low", "low-high", "high-low", "high-high")
localMI.func.map <- tm_shape(adm_wp.localMI) +
tm_fill(col = "func.quadrant",
style="cat",
palette =colors,
label=clusters,
title="Quadrant") +
tm_borders(alpha=0.5) +
tm_layout(main.title="LISA Cluster (Functional)")
localMI.nonfunc.map <- tm_shape(adm_wp.localMI) +
tm_fill(col = "nonfunc.quadrant",
style="cat",
palette =colors,
label=clusters,
title="Quadrant") +
tm_borders(alpha=0.5) +
tm_layout(main.title="LISA Cluster (Non-Functional)")
tmap_arrange(pfunc.map, localMI.func.map,
pnonfunc.map, localMI.nonfunc.map,
asp=1, ncol=2, nrow=2)
The LISA cluster maps clearly show that there is clustering in the proportion of functional/non-functional waterpoints. There is a hotspot for functional waterpoints in the central north of Nigeria. There is also a coldspot in the southeast where a custer of regions have low proportion of functional waterpints. This area is an area of concern for maintenance efforts. There are not many outliers.
There is a coldspot of non-functional waterpoints in the central north of Nigeria that overlaps with the hotspot of functional waterpoints. This is logical as the proportions have an negative relationship. If the proportion of functional waterpoints is high, the proportion of non-functional waterpoints should be low.
However, we see a different story in the south where there is a hotspot of non-functional waterpoint areas and no corresponding hotspot for functional waterpoint. This could be due to a difference in the centring of the proportions. From the code chunk below, we see that the mean proportion of non-functioning waterpoints is much lower than that of functional waterpoints (because there are unknowns as well). The threshold to be classified as a high non-functional area is lower than the threshold to be classified as a low functional area.
mean(adm_wp$pFunctional, na.rm=TRUE)[1] 0.5069513
mean(adm_wp$pNonfunctional, na.rm=TRUE)[1] 0.3653858
The 3 hotspot of high proportion of non-functional waterpoints in the south and west of Nigeria are a cause for concern for maintenance efforts.
5. Conclusion
The analysis showed there is clearly uneven distribution of waterpoints in Nigeria, and there is uneven distribution of available (functional) waterpoints. There were significant hotspots with high proportion of non-functional waterpoints which could be indicative of underlying problems in the maintenance regimes. Nonetheless, further analysis should be conducted considering population or water demand to determine water stress and decide on high priority areas to add waterpoints or step up maintenance.